Learning to Simplify Sentences Using Wikipedia
Abstract
In this paper we examine the sentence simplification problem as an English-to-English translation problem, utilizing a corpus of 137K aligned sentence pairs extracted by aligning English Wikipedia and Simple English Wikipedia. This data set contains the full range of transformation operations, including rewording, reordering, insertion, and deletion. We introduce a new translation model for text simplification that extends a phrase-based machine translation approach to include phrasal deletion. Evaluated on three metrics that compare against a human reference (BLEU, word-F1, and SSA), our new approach performs significantly better than two text compression techniques (including T3) and the phrase-based translation system without deletion.
Similar resources
Learning to Simplify Sentences with Quasi-Synchronous Grammar and Integer Programming
Text simplification aims to rewrite text into simpler versions, and thus make information accessible to a broader audience. Most previous work simplifies sentences using handcrafted rules aimed at splitting long sentences, or substitutes difficult words using a predefined dictionary. This paper presents a data-driven model based on quasi-synchronous grammar, a formalism that can naturally captur...
Document Summarization using Wikipedia
Krishnan Ramanathan, Yogesh Sankarasubramaniam, Nidhi Mathur, Ajay Gupta. HP Laboratories, HPL-2009-39. Keywords: Single Document Summarization, Wikipedia, ROUGE. Although most of the developing world is likely to first access the Internet through mobile phones, mobile devices are constrained by screen space, bandwidth and limited attention span. Single document summa...
Experimental evaluation of learning performance for exploring the shortest paths in hyperlink network of Wikipedia
In a 9-hour experiment we evaluated learning performance based on exploring the shortest paths in the hyperlink network of the Wikipedia online encyclopedia. Relying on a network of 35688 unique hyperlinks, in three separate learning sessions of 20 minutes students read a series of 62 sentences built by using 22 unique hyperlinks that form the eleven shortest paths and answered pre-test and post-test multip...
Learning to Identify Definitions using Syntactic Features
This paper describes an approach to learning concept definitions which operates on fully parsed text. A subcorpus of the Dutch version of Wikipedia was searched for sentences which have the syntactic properties of definitions. Next, we experimented with various text classification techniques to distinguish actual definitions from other sentences. A maximum entropy classifier which incorporates ...
Grammar frequency and simplification: when intuition fails
We investigate whether a medical writer can simplify text by only changing the grammatical structure. Based on a user study, we find that while the sentences look simpler after simplification, they are not easier to understand. For grammatical simplification, better tools are needed to provide more concrete guidance and feedback. Introduction Providing text to patients and health information co...